LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties

Authors

  • Hongyi Xin
  • Jeremie Kim
  • Sunny Nahar
  • Can Alkan
  • Onur Mutlu
Abstract

Motivation: Approximate string matching is a pivotal problem in computer science. It is an integral component of many string algorithms, most notably DNA read mapping and alignment. The improved Landau-Vishkin (LV) algorithm offers a more efficient dynamic programming strategy than the banded Smith-Waterman algorithm, but it supports only a limited selection of scoring schemes. In this paper, we propose the Leaping Toad problem, a generalization of the approximate string matching problem, as well as LEAP, a generalization of the Landau-Vishkin algorithm that solves the Leaping Toad problem under a broader selection of scoring schemes. Results: We benchmarked LEAP against three state-of-the-art approximate string matching implementations. We show that, when using a bit-vectorized optimization based on de Bruijn sequences, LEAP is up to 7.4x faster than the state-of-the-art bit-vector Levenshtein distance implementation and up to 32x faster than the state-of-the-art affine-gap-penalty parallel Needleman-Wunsch implementation. Availability: We provide an implementation of LEAP in C++ at github.com/CMU-SAFARI/LEAP . Contact: [email protected], [email protected] or [email protected]
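For readers unfamiliar with the Landau-Vishkin strategy that LEAP generalizes, the following is a minimal sketch of the classic unit-cost (Levenshtein) version only. It is not the LEAP algorithm and is not taken from the LEAP repository: it does not handle custom gap penalties or the de Bruijn bit-vector optimization, and the function name `lv_edit_distance` and the `max_edits` parameter are hypothetical names chosen for illustration. The idea is to track, for each edit count e and each diagonal d = j - i of the dynamic programming table, the furthest row reachable, and to slide along exact matches for free.

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch (hypothetical name): unit-cost edit distance between
// a and b using furthest-reaching diagonals, in the style of Landau-Vishkin.
// Returns the distance, or -1 if it exceeds max_edits.
int lv_edit_distance(const std::string& a, const std::string& b, int max_edits) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    const int target = n - m;  // diagonal that contains the end cell (m, n)
    if (max_edits < 0 || std::abs(target) > max_edits) return -1;

    const int kUnreached = -(1 << 29);  // sentinel for diagonals not yet reached
    const int offset = max_edits + 1;   // maps diagonal d to index offset + d
    std::vector<int> prev(2 * max_edits + 3, kUnreached);
    std::vector<int> cur(prev);

    for (int e = 0; e <= max_edits; ++e) {
        // Only diagonals that exist in the table and are reachable with e edits.
        const int lo = std::max(-e, -m);
        const int hi = std::min(e, n);
        for (int d = lo; d <= hi; ++d) {
            int i = 0;  // furthest row of a reached on diagonal d with e edits
            if (e > 0) {
                i = std::max({prev[offset + d] + 1,        // substitution
                              prev[offset + d - 1],        // insertion (skip a char of b)
                              prev[offset + d + 1] + 1});  // deletion (skip a char of a)
                i = std::min({i, m, n - d});               // stay inside the table
            }
            // Slide along the diagonal while characters match (costs no edits).
            while (i < m && i + d < n && a[i] == b[i + d]) ++i;
            cur[offset + d] = i;
            if (d == target && i == m) return e;  // reached cell (m, n)
        }
        std::swap(prev, cur);
    }
    return -1;
}
```

Calling `lv_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with `max_edits = 2` returns -1 because the threshold is exceeded. The work is bounded by the number of (edit count, diagonal) pairs plus the matching slides, which is what makes the approach attractive when the allowed number of edits is small.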

Similar articles

Modifications of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays and Efficient RMQ computations

Approximate string matching is an important problem in Computer Science. The standard solution for this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic programming table and reaching space and running time in O(nk), wher...

A Modification of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays

Approximate string matching is an essential problem in many areas related to Computer Science including biological sequence processing. The standard solution of this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic progra...

A cloud-based simulated annealing algorithm for order acceptance problem with weighted tardiness penalties in permutation flow shop scheduling

Make-to-order is a production strategy in which manufacturing starts only after a customer's order is received; in other words, it is a pull-type supply chain operation since manufacturing is carried out as soon as the demand is confirmed. This paper studies the order acceptance problem with weighted tardiness penalties in permutation flow shop scheduling with MTO production strategy, the objec...

On a Parallel-Algorithms Method for String Matching Problems

Suffix trees are the main data-structure in string matching algorithmics. There are several serial algorithms for suffix tree construction which run in linear time, but the number of operations in the only parallel algorithm available, due to Apostolico, Iliopoulos, Landau, Schieber and Vishkin, is proportional to n log n. The algorithm is based on labeling substrings, similar to a classical serial...

Solving the Single Machine Problem with Quadratic Earliness and Tardiness Penalties

Nowadays, scheduling problems have considerable application in production and service systems. In this paper, we consider the scheduling of n jobs on a single machine assuming no machine idleness, non-preemptive jobs and equal process times. In much previous research, because of delivery delays and holding costs, earliness and tardiness penalties emerge in the form of linear combin...

Publication date: 2017